GlobalPhone: A Multilingual Text & Speech Database in 20 Languages

نویسندگان

  • Tanja Schultz
  • Ngoc Thang Vu
  • Tim Schlippe
چکیده

This paper describes the advances in the multilingual text and speech database GlobalPhone, a multilingual database of highquality read speech with corresponding transcriptions and pronunciation dictionaries in 20 languages. GlobalPhone was designed to be uniform across languages with respect to the amount of data, speech quality, the collection scenario, the transcription and phone set conventions. With more than 400 hours of transcribed audio data from more than 2000 native speakers GlobalPhone supplies an excellent basis for research in the areas of multilingual speech recognition, rapid deployment of speech processing systems to yet unsupported languages, language identification tasks, speaker recognition in multiple languages, multilingual speech synthesis, as well as monolingual speech recognition in a large variety of languages.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

GlobalPhone: Pronunciation Dictionaries in 20 Languages

This paper describes the advances in the multilingual text and speech database GLOBALPHONE a multilingual database of high-quality read speech with corresponding transcriptions and pronunciation dictionaries in 20 languages. GLOBALPHONE was designed to be uniform across languages with respect to the amount of data, speech quality, the collection scenario, the transcription and phone set convent...

متن کامل

Globalphone: a Multilingual Spee Developed at Karlsruhe

This paper describes the design, collection, and current status of the multilingual database GlobalPhone, an ongoing project since 1995 at Karlsruhe University. GlobalPhone is a highquality read speech and text database in a large variety of languages which is suitable for the development of large vocabulary speech recognition systems in many languages. It has already been successfully applied ...

متن کامل

The GlobalPhone Project: Multilingual LVCSR with JANUS-3

This paper describes our recent e ort in developing the GlobalPhone database for multilingual large vocabulary continuous speech recognition. In particular we present the current status of the GlobalPhone corpus containing high quality speech data for the 9 languages Arabic, Chinese, Croatic, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish. We also discuss the JANUS-3 toolkit and ho...

متن کامل

Multilingual and Crosslingual Speech Recognition

This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Tu...

متن کامل

Fast bootstrapping of LVCSR systems with multilingual phoneme sets

In this paper we described an e cient method to bootstrap continuously spoken, large vocabulary speech recognition systems by multilingual phoneme sets. To evaluate this techniques we collected the multilingual database GlobalPhone which currently consists of 9 di erent languages. A multilingual recognizer (MULTI) based on the four languages German, English, Japanese and Spanish was developed t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013